MOLECULAR DOCKING TECHNIQUE FOR 
SCREENING OF COMBINATORIAL LIBRARIES 



Technical Field 

The present invention relates in general to screening 
combinatorial libraries by identification of binding ligands 
and ultimately pharmaceutical compounds, and more 
particularly, to a high throughput molecular docking 
technique for screening of combinatorial libraries. 

Background of the Invention 

With the advent of combinatorial chemistry and the 
resulting ability to synthesize large collections of 
compounds for a broad range of targets, it has become 
apparent that the capability to effectively prioritize 
screening efforts is crucial to the rapid identification of 
the appropriate region of chemical space for a given target. 
Since it has been generally observed that hits obtained 
against a given target are clustered in a finite region of 
chemical space, there is reason to believe that, given the 
right computational tools, it is possible to prioritize 
screening efforts such that only libraries containing active 
compounds are interrogated. Effective prioritization tools 
would allow scientists both to obtain leads in a 
cost-effective and efficient manner and to test virtual 
libraries against novel targets prior to active synthesis 
and bioanalysis, thereby reducing synthesis costs. With the 
expected flood of new targets becoming available in the 
coming decade, it will be critical to focus screening 
efforts on target-appropriate regions of chemical space. 

There are many challenges to overcome prior to being 
able to develop appropriate library prioritization tools. At 
one extreme are the screens for which there is no structural 
data for the target. In these cases, QSAR or other data 
mining tools are typically the method of choice for 
screening prioritization. At the opposite extreme are the 
structure-based approaches that rely on the availability of 
X-ray structures of the target. Unfortunately, in most 
cases, a crystal structure is not available. With the 
advent of proteomics and high-throughput protein 
crystallography, however, it is likely that for a given 
target, a structure of a related protein will be available. 
In these cases, a homology model can be built starting from 
the structure of a related protein, and structure-based 
tools could be utilized in conjunction with QSAR or other 
data mining tools. 

When structural information for a target protein is 
available, molecular docking can be a useful tool for 
prioritizing screening efforts (reference: Charifson, P.S., 
ed. Practical Application of Computer-aided Drug Design, 
1997, Marcel Dekker: New York, 551; Knegtel, R.M.A. and M. 
Wagener, "Efficacy and Selectivity in Flexible Database 
Docking," PROTEINS: Structure, Function and Genetics, 1999, 
Vol. 37, p. 334-345; and Debnath, A.K., L. Radigan, and S. 
Jiang, "Structure-based Identification of Small Molecule 
Antiviral Compounds Targeted to the gp41 Core Structure of 
the Human Immunodeficiency Virus Type 1," Journal of 
Medicinal Chemistry, 1999, Vol. 42(17), p. 3202-3209). 
Operationally, this means that rather than assaying an 
entire collection of compounds, the compounds are first 
docked and ranked via some scoring function, and then only a 
subset of the compounds, usually the highest ranked, are 
assayed. This approach to prioritizing screening efforts 
usually increases the number of active compounds found by a 
factor of 1-10 when compared to a randomly selected subset 
of compounds (see Charifson, P.S., et al., "Consensus 
Scoring: A Method for Obtaining Improved Hit Rates From 
Docking Databases of Three-dimensional Structures Into 
Proteins," Journal of Medicinal Chemistry, 1999, Vol. 
42(25), p. 5100-5109). 

The ultimate goal of this invention is to use molecular 
docking as a way to prioritize combinatorial library 
screening efforts, i.e., rather than ranking individual 
compounds, combinatorial libraries of compounds are ranked. 
Compounds synthesized through combinatorial methods are 
often quite flexible when compared to typical databases of 
compounds used for molecular docking studies. Thus, for a 
docking procedure to be useful, it should be able to handle 
fairly flexible compounds (as many as 10-20 rotatable 
bonds), and it should be extremely fast (on the order of one 
million compounds a week). With these constraints in mind, 
a new docking technique has been developed and validated, as 
presented hereinbelow. 



Disclosure of the Invention 



To briefly summarize, presented herein in one aspect is 
a method of docking a ligand to a protein. The method 
includes: performing a pre-docking conformational search to 
generate multiple solution conformations of the ligand; 
generating a binding site image of the protein, the binding 
site image comprising multiple hot spots; matching hot spots 
of the binding site image to atoms in at least one solution 
conformation of the multiple solution conformations of the 
ligand to obtain at least one ligand position relative to 
the protein; and optimizing the at least one ligand position 
while allowing translation, orientation and rotatable bonds 
of the ligand to vary, and while holding the protein itself 
fixed. 



In another aspect, a system for docking a ligand to a 
protein is provided. The system includes means for 
performing a pre-docking conformational search to generate 
multiple solution conformations of the ligand. In addition, 
the system includes means for generating a binding site 
image of the protein, with the binding site image comprising 
multiple hot spots; and means for matching hot spots of the 
binding site image to atoms in at least one solution 
conformation of the multiple solution conformations of the 
ligand to obtain at least one ligand position relative to 
the protein. An optimization mechanism is also provided for 
optimizing the at least one ligand position while allowing 
translation, orientation and rotatable bonds of the ligand 
to vary, and while holding the protein fixed. 

In a further aspect, the invention comprises at least 
one program storage device readable by a machine, tangibly 
embodying at least one program of instructions executable by 
the machine to perform a method of docking a ligand to a 
protein. The method includes: performing a pre-docking 
conformational search to generate multiple solution 
conformations of the ligand; generating a binding site image 
of the protein, the binding site image comprising multiple 
hot spots; matching hot spots of the binding site image to 
atoms in at least one solution conformation of the multiple 
solution conformations of the ligand to obtain at least one 
ligand position relative to the protein; and optimizing the 
at least one ligand position while allowing translation, 
orientation and rotatable bonds of the ligand to vary, and 
while holding the protein fixed. 

The docking method presented herein has several 
advantages. First, it is built from several independent 
pieces. This allows one to better take advantage of 
scientific breakthroughs. For example, when a better 
conformational search procedure (in the present context this 
means more biologically relevant conformers) becomes 
available, it can be used to replace the current 
conformational search procedure by generating new 3-D 
databases. Second, this approach to ligand flexibility is 
better suited for the class of compounds synthesized through 
combinatorial methods. Compounds from combinatorial 
libraries frequently do not have a clear anchor fragment. 
Because finding and docking an anchor fragment from the 
ligand are key steps in the incremental construction 
algorithms, these algorithms may encounter difficulties with 
compounds commonly found in combinatorial libraries. 
(Incremental construction algorithms work roughly as 
follows: the ligand is divided into rigid fragments; the 
largest of these fragments is docked into the binding site 
of the protein; and the ligand is then rebuilt in the 
binding site by attaching the appropriate fragments and 
systematically searching around the rotatable bonds. The 
procedure is described further in: M. Rarey, B. Kramer, T. 
Lengauer, & G. Klebe, "A fast flexible docking method using 
an incremental construction algorithm", J. Molecular 
Biology, 261 (1996), pp. 470-489; and S. Makino & I. Kuntz, 
"Automated flexible ligand docking method and its 
application to database search", J. Computational Chemistry, 
18 (1997), pp. 1812-1825.) Docking entire conformations 
overcomes this difficulty. In addition, including an 
efficient flexible optimization step removes a significant 
burden from the conformational search procedure. Further 
improvements in energy minimization algorithms can also be 
taken advantage of, as they become available. 

The approach herein to ligand flexibility could be 
viewed as a liability because of a reliance on an initial 
conformational search. As indicated previously, in order to 
achieve maximum efficiency the conformational search should 
be performed once for an entire library or collection and 
the resulting conformations stored for future use. For 
large collections, this would be a considerable investment 
in both computer time and disk space. Because a database 
will typically be used many times, the initial computer time 
for the conformational search can easily be justified. 
Moreover, with the availability of parallel computers and 
faster CPUs, the conformational search can be completed or 
occasionally redone in a reasonable amount of time. Since 
disk sizes are now approaching the terabyte level, storing 
the conformations for millions of compounds presents no 
problem. 

Brief Description of the Drawings 

The above-described objects, advantages and features of 
the present invention, as well as others, will be more 
readily understood from the following detailed description 
of certain preferred embodiments of the invention, when 
considered in conjunction with the accompanying drawings in 
which: 

FIGS. 1A-1C conceptually depict protein-ligand complex 
formation; 

FIG. 2 is a flowchart of one embodiment of a molecular 
docking approach in accordance with the principles of the 
present invention; 

FIG. 3 is a flowchart of one embodiment of a molecular 
conformational search procedure which can be employed by the 
docking approach of FIG. 2, in accordance with the 
principles of the present invention; 

FIG. 4 is a flowchart of one embodiment of establishing 
a binding site image for use with the molecular docking 
approach of FIG. 2, in accordance with the principles of the 
present invention; 

FIG. 5 is a flowchart of one embodiment of a matching 
procedure for use with the molecular docking approach of 
FIG. 2, in accordance with the principles of the present 
invention; 

FIG. 6 is a flowchart of one embodiment of an 
optimization stage for optimizing ligand positions within 
identified matches for use with the molecular docking 
approach of FIG. 2, in accordance with the principles of the 
present invention; 

FIG. 7 graphically depicts a hydrogen bonding potential 
and a steric potential for use in atom pairwise scoring in 
accordance with the principles of the present invention; and 

FIG. 8 depicts one embodiment of a computer environment 
providing and/or using the capabilities of the present 
invention. 



Best Mode for Carrying Out the Invention 



The docking procedure discussed below is based on a 
conceptual picture of protein-ligand complex formation (see 
FIGS. 1A-1C). Initially, the ligand (L) adopts many 
conformations in solution. The protein (P) recognizes one 
or several of these conformations. Upon recognition, the 
ligand, protein and solvent follow the local energy 
landscape to form the final complex. 

This simple picture of protein/ligand complex formation 
is converted into an efficient computational model in 
accordance with an aspect of the present invention, as 
follows. The initial solution conformations are generated 
using a straightforward conformational search procedure. 
One might view the conformational search part of this 
technique as part of the entire docking process, but since 
it involves only the ligand, it can be decoupled from the 
purely docking steps. This is justified since 3-D databases 
of conformations for a collection of molecules can readily 
be generated and stored for use in numerous docking studies 
(for example, using Catalyst, see A. Smellie, S.D. Kahn, 
S.L. Teig, "Analysis of Conformational Coverage. 1. 
Validation and Estimation of Coverage", J. Chem. Inf. 
Comput. Sci. (1995) v35, pp. 285-294; and A. Smellie, S.D. 
Kahn, S.L. Teig, "Analysis of Conformational Coverage. 2. 
Application of Conformational Models", J. Chem. Inf. Comput. 
Sci. (1995) v35, pp. 295-304). The recognition stage is 
modeled by matching atoms of the ligand to interaction 
"hot spots" in the binding site. The final complex 
formation is modeled using a gradient-based optimization 
technique with a simple energy function. During this final 
stage, the translation, orientation, and rotatable bonds of 
the ligand are allowed to vary, while the protein and 
solvent are held fixed. 

Most docking methods can be classified into one of two 
loosely defined categories: (1) stochastic, such as 
AutoDock (Goodford, P.J., "A Computational Procedure for 
Determining Energetically Favorable Binding Sites on 
Biologically Important Macromolecules," Journal of Medicinal 
Chemistry, 1985, Vol. 28(7), p. 849-857; Goodsell, D.S. and 
A.J. Olson, "Automated Docking of Substrates to Proteins by 
Simulated Annealing," PROTEINS: Structure, Function and 
Genetics, 1990, Vol. 8, p. 195-202), GOLD (Jones, G., et 
al., "Development and Validation of a Genetic Algorithm for 
Flexible Docking," Journal of Molecular Biology, 1997, Vol. 
267, p. 727-748), TABU (Westhead, D.R., D.E. Clark, and C.W. 
Murray, "A Comparison of Heuristic Search Algorithms for 
Molecular Docking," Journal of Computer-Aided Molecular 
Design, 1997, Vol. 11, p. 209-228; and Baxter, C.A. et al., 
"Flexible Docking Using Tabu Search and an Empirical 
Estimate of Binding Affinity," PROTEINS: Structure, 
Function, and Genetics, 1998, Vol. 33, p. 367-382), and 
Stochastic Approximation with Smoothing (SAS) (Diller, D.J. 
and C.L.M.J. Verlinde, "A Critical Evaluation of Several 
Global Optimization Algorithms for the Purpose of Molecular 
Docking," Journal of Computational Chemistry, 1999, Vol. 
20(16), p. 1740-1751); or (2) combinatorial, for example, 
DOCK (Kuntz, I.D., et al., "A Geometric Approach to 
Macromolecular-ligand Interactions," Journal of Molecular 
Biology, 1982, Vol. 161, p. 269-288; Kuntz, I.D., 
"Structure-based Strategies for Drug Design and Discovery," 
Science, 1992, Vol. 257, p. 1078-1082; Makino, S. and I.D. 
Kuntz, "Automated Flexible Ligand Docking Method and Its 
Application for Database Search," Journal of Computational 
Chemistry, 1997, Vol. 18(4), p. 1812-1825), FlexX (Rarey, 
M., et al., "A Fast Flexible Docking Method Using an 
Incremental Construction Algorithm," Journal of Molecular 
Biology, 1996, Vol. 261, p. 470-489; Rarey, M., B. Kramer, 
and T. Lengauer, "The Particle Concept: Placing Discrete 
Water Molecules During Protein-ligand Docking Predictions," 
PROTEINS: Structure, Function, and Genetics, 1999, Vol. 34, 
p. 17-28; Rarey, M., B. Kramer, and T. Lengauer, "Docking of 
Hydrophobic Ligands With Interaction-based Matching 
Algorithms," Bioinformatics, 1999, Vol. 15(3), p. 243-250), 
and HammerHead (Welch, W., J. Ruppert, and A.N. Jain, 
"Hammerhead: Fast Fully Automated Docking of Flexible 
Ligands to Protein Binding Sites," Chemistry & Biology, 
1996, Vol. 3(6), p. 449-462). 

The stochastic methods, while often providing more 
accurate results, are typically too slow to search large 
databases. The method presented herein falls into the 
combinatorial group. This approach is analogous to FlexX 
and HammerHead in that it attempts to match interactions 
between the ligand and receptor. It differs from these and 
most other docking techniques significantly in how it 
handles the flexibility of the ligand. Most current 
combinatorial docking techniques handle flexibility using an 
incremental construction approach, whereas the technique 
described herein uses an initial conformational search 
followed by a gradient-based minimization in the presence of 
the target protein. 

A generalized technique of one embodiment of the 
present invention is depicted in FIG. 2. Initially, a 
conformational search procedure 210 is performed for an 
entire library or collection, with the resulting 
conformations stored for future use. A binding site image 
is then created using the protein structure 220. A matching 
procedure is performed to form an initial complex by 
initially positioning a given conformation of a ligand as a 
rigid body into the binding site 230. Finally, a flexible 
optimization is performed wherein the matches are pruned and 
then optimized to attain the final result 240. Each of 
these steps of a docking approach, in accordance with the 
present invention, is described in greater detail below with 
reference to FIGS. 3-6, respectively. 

The Conformational Search Procedure 

For one aspect of the present invention, a 
straightforward yet effective conformational search 
procedure is preferred. A conformational search is performed 
once for an entire library or a collection, with the 
resulting conformations stored for future use. If desired, 
the conformational searching can be periodically repeated. 

Referring to FIG. 3, uniformly distributed random 
conformations are generated allowing only rotatable bonds to 
vary 310. For example, 1,000 uniformly distributed random 
conformations can be generated varying only the rotatable 
bonds. The internal energy of each conformation is then 
minimized, again allowing only rotatable bonds to vary 320. 
Internal energy can be estimated, for example, using van der 
Waals potentials and a dihedral angle term (reference: 
Diller, D.J. and C.L.M.J. Verlinde, "A Critical Evaluation 
of Several Global Optimization Algorithms for the Purpose of 
Molecular Docking," Journal of Computational Chemistry, 
1999, Vol. 20(16), p. 1740-1751, which is hereby 
incorporated herein by reference in its entirety). Each 
conformation can be minimized using, for example, a BFGS 
(Broyden-Fletcher-Goldfarb-Shanno) optimization algorithm 
(e.g., reference Press, W.H., et al., Numerical Recipes in 
C, 2nd ed., 1997, Cambridge: Cambridge University Press, 
994, which is hereby incorporated herein by reference in its 
entirety). 
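By way of illustration only, the generate-then-minimize steps 310 and 320 can be sketched in Python as follows. This is a toy sketch under stated assumptions: a conformation is reduced to a list of torsion angles, the energy is a hypothetical threefold dihedral term standing in for the van der Waals plus dihedral energy, and plain gradient descent is swapped in for BFGS so that the example needs only the standard library. The names `random_conformations`, `torsion_energy` and `minimize` are illustrative and do not come from the described embodiment.

```python
import math
import random


def random_conformations(n_rot_bonds, n_conf=1000, seed=0):
    """Draw torsion angles uniformly in [-pi, pi] for each rotatable bond."""
    rng = random.Random(seed)
    return [[rng.uniform(-math.pi, math.pi) for _ in range(n_rot_bonds)]
            for _ in range(n_conf)]


def torsion_energy(angles, barrier=1.0):
    """Toy threefold dihedral term (assumption; stands in for the real energy)."""
    return sum(0.5 * barrier * (1.0 + math.cos(3.0 * a)) for a in angles)


def torsion_gradient(angles, barrier=1.0):
    """Analytic derivative of torsion_energy with respect to each angle."""
    return [-1.5 * barrier * math.sin(3.0 * a) for a in angles]


def minimize(angles, step=0.05, n_steps=300):
    """Fixed-step gradient descent over the torsions only (stand-in for BFGS)."""
    x = list(angles)
    for _ in range(n_steps):
        g = torsion_gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


confs = random_conformations(n_rot_bonds=4, n_conf=1000)
minimized = [minimize(c) for c in confs]
energies = [torsion_energy(c) for c in minimized]
```

Only the torsion angles change during minimization, mirroring the requirement that only rotatable bonds vary; bond lengths and angles never enter the parameter vector.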

Conformations with internal energy over a selected 
cut-off above a conformation with the lowest internal energy 
are eliminated 330. For example, any conformation with an 
internal energy of 15 kcal/mol above the conformation with 
the lowest internal energy is eliminated. The remaining 
conformations are scored and ranked 340. Conformations can 
be ranked by a score defined as: 

Score = Strain - 0.1 × SASA 

where SASA is the "solvent accessible surface area" of a 
particular conformation; and "strain" of a given 
conformation of a given molecule is the internal energy of 
the given conformation minus the internal energy of the 
conformation of the given molecule with the lowest internal 
energy. Conformations within a pre-defined rms deviation of 
a better conformation are removed 350. For example, any 
conformation within an rms deviation of 1.0 Å of a higher 
ranked (i.e., better) conformation can be removed. This 
clustering is a means to remove redundant conformations. A 
maximum number of desired conformations, for example, 50 
conformations, are retained at the end of the conformational 
analysis step 360. 

If more than the desired number of conformations remain 
after clustering, then the lowest ranked conformations can 
be removed until the desired number of conformations remain. 
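The filtering, ranking and clustering steps 330-360 can be sketched as follows. This is a minimal illustration, assuming that a lower score is better (lower strain, more exposed surface), that each conformation is supplied as a dictionary with hypothetical keys `energy`, `sasa` and `coords`, and that the rms deviation is computed without superposition. None of these choices is dictated by the text above.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def select_conformations(confs, energy_cutoff=15.0, sasa_weight=0.1,
                         rms_cutoff=1.0, max_keep=50):
    """Apply strain cut-off, rank by Strain - 0.1 x SASA, cluster, truncate."""
    e_min = min(c["energy"] for c in confs)
    scored = []
    for c in confs:
        strain = c["energy"] - e_min
        if strain > energy_cutoff:          # step 330: too strained
            continue
        scored.append((strain - sasa_weight * c["sasa"], c))
    scored.sort(key=lambda t: t[0])         # step 340: best (lowest) first
    kept = []
    for score, c in scored:                 # step 350: drop near-duplicates
        if all(rmsd(c["coords"], k["coords"]) >= rms_cutoff for _, k in kept):
            kept.append((score, c))
    return kept[:max_keep]                  # step 360: keep at most max_keep


confs = [
    {"name": "A", "energy": 0.0, "sasa": 100.0,
     "coords": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]},
    {"name": "B", "energy": 2.0, "sasa": 150.0,
     "coords": [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]},
    {"name": "C", "energy": 1.0, "sasa": 100.0,
     "coords": [(0.0, 0.0, 0.0), (1.4, 0.1, 0.0)]},
    {"name": "D", "energy": 20.0, "sasa": 300.0,
     "coords": [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]},
]
kept = select_conformations(confs)
names = [c["name"] for _, c in kept]
```

In this toy input, D exceeds the 15 kcal/mol strain cut-off and C lies within 1.0 Å of the better-ranked A, so only B and A survive.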

The process of a small molecule binding to a protein 
target is a balance between "solvation" by water versus 
"solvation" by the protein. With this in mind, the solvent 
accessible surface area term can be chosen in analogy with 
simple aqueous solvation models, e.g., reference Eisenberg, 
D. and A.D. McLachlan, "Solvation Energy in Protein Folding 
and Binding," Nature, 1986, Vol. 319, p. 199-203; Ooi, T., 
et al., "Accessible Surface Areas as a Measure of the 
Thermodynamic Parameters of Hydration of Peptides," 
Proceedings of the National Academy of Sciences, 1987, Vol. 
84, p. 3086-3090; and Vajda, S., et al., "Effect of 
Conformational Flexibility and Solvation on Receptor-ligand 
Binding Free Energies," Biochemistry, 1994, Vol. 33, p. 
13977-13988, each of which is hereby incorporated herein by 
reference in its entirety. The key difference in protein 
versus water "solvation" is that water competes for polar 
interactions only, while a protein effectively competes for 
both polar and hydrophobic interactions. Therefore, for 
purposes of this invention, polar and apolar surface areas 
are treated identically. The choice of 0.1 as a weight for 
the surface area term is somewhat arbitrary, but is 
comparable to the weights chosen for surface area based 
solvation models. Ultimately, conformations with more 
solvent accessible surface area are going to be able to 
interact more extensively with a target protein and can, 
therefore, be of somewhat higher strain and still bind 
tightly. A more refined ranking system could be used with 
the present invention, but this approach to ranking 
conformations supplies reasonable conformations. 

The Binding Site Image - Locating the Hot Spots 

The binding site image comprises a list of apolar hot 
spots, i.e., points in the binding site that are favorable 
for an apolar atom to bind, and a list of polar hot spots, 
i.e., points in the binding site that are favorable for a 
hydrogen bond donor or acceptor to bind. One procedure for 
creating these two lists is depicted in FIG. 4. First, in 
order to find the binding site, a grid is placed around the 
binding site 410. By way of example, the grid may be at 
least 20 Å × 20 Å × 20 Å with at least 5 Å of extra space 
in each direction. A 0.2 Å spacing can be used for the 
grid. Next, a "hot spot search volume" is determined 420. 
This is accomplished by eliminating any grid point inside 
the protein. Any point contained in, for example, a 6.0 Å 
or larger sphere not touching the protein can also be 
eliminated. The largest remaining connected piece becomes 
the "hot spot search volume". 
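A simplified sketch of steps 410-420 is given below. It is illustrative only and makes several assumptions that are not in the text: the protein is a bare list of atom centers with a single hypothetical `atom_radius`, a point is treated as "inside the protein" when it is closer than that radius to any atom, and the empty-sphere test is approximated by discarding points farther than `bulk_dist` from every atom. The connected-piece search uses six-neighbor grid connectivity.

```python
import math
from collections import deque

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def largest_component(cells):
    """Return the largest six-connected set of integer grid cells."""
    remaining = set(cells)
    best = set()
    while remaining:
        start = remaining.pop()
        comp = {start}
        queue = deque([start])
        while queue:                       # breadth-first flood fill
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBORS:
                nb = (i + di, j + dj, k + dk)
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best


def search_volume(atoms, n, spacing=1.0, atom_radius=2.0, bulk_dist=6.0):
    """Keep grid cells outside the protein but not in bulk solvent (simplified)."""
    keep = set()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = (i * spacing, j * spacing, k * spacing)
                d = min(math.dist(p, a) for a in atoms)
                if atom_radius <= d < bulk_dist:
                    keep.add((i, j, k))
    return largest_component(keep)


volume = search_volume([(0.0, 0.0, 0.0)], n=4, spacing=1.0,
                       atom_radius=1.5, bulk_dist=3.0)
```

A production grid at 0.2 Å spacing would contain on the order of a million points, so a spatial index over the protein atoms would normally replace the brute-force `min` over all atoms used here.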
The hot spots can then be determined using a grid-like 
search of the hot spot search volume 430. By way of 
example, a grid-like search is described in Goodford, P.J., 
"A Computational Procedure for Determining Energetically 
Favorable Binding Sites on Biologically Important 
Macromolecules," Journal of Medicinal Chemistry, 1985, Vol. 
28(7), p. 849-857, which is hereby incorporated herein by 
reference in its entirety. To find the apolar hot spots, an 
apolar probe is placed at each grid point in the hot spot 
search volume, and the probe score is calculated and stored. 
The process is repeated for polar hot spots. For each type 
of hot spot, the grid points are clustered and a desired 
number of best clustered grid points is maintained 440. For 
example, the top 30 clustered grid points may be retained. 
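The clustering rule is not spelled out above, so the following sketch assumes one simple possibility: a greedy pass over the scored grid points, best first, that keeps a point only if it is at least a hypothetical `cluster_radius` away from every point already kept. It also assumes that a lower probe score is more favorable. The function name `pick_hot_spots` is illustrative.

```python
import math


def pick_hot_spots(scored_points, cluster_radius=2.0, max_spots=30):
    """Greedy clustering of (position, score) pairs; lower score = better."""
    ordered = sorted(scored_points, key=lambda ps: ps[1])
    spots = []
    for pos, score in ordered:
        # Keep a point only if no better point already covers its neighborhood.
        if all(math.dist(pos, kept_pos) >= cluster_radius
               for kept_pos, _ in spots):
            spots.append((pos, score))
            if len(spots) == max_spots:
                break
    return spots


probe_scores = [
    ((0.0, 0.0, 0.0), -3.0),
    ((0.5, 0.0, 0.0), -2.5),   # within 2 A of the best point, so dropped
    ((5.0, 0.0, 0.0), -2.0),
    ((5.2, 0.0, 0.0), -1.0),   # within 2 A of the point at x = 5, so dropped
]
hot_spots = pick_hot_spots(probe_scores)
```

The same routine would be run once with the apolar probe scores and once with the polar probe scores to produce the two hot spot lists.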

The Matching Procedure - Forming an Initial Complex 

Referring to FIG. 5, in order to initially position a 
given conformation of a ligand as a rigid body into the 
binding site, the atoms of the ligand are matched to the 
appropriate hot spots 510. More precisely, in one example, 
a triplet of atoms, $A_1$, $A_2$, $A_3$, is considered a match 
to a triplet of hot spots, $H_1$, $H_2$, $H_3$, if: 

i. The type of $A_j$ matches the type of $H_j$ for each 
j = 1, 2, 3, that is, apolar hot spots match apolar 
atoms and polar hot spots match polar atoms. 
ii. $D(A_j, A_k) = D(H_j, H_k) \pm \delta$ for all 
j, k = 1, 2, 3, where $D(A_j, A_k)$ and $D(H_j, H_k)$ are 
the distances from $A_j$ to $A_k$ and from $H_j$ to $H_k$, 
respectively, and $\delta$ is some allowable amount of 
error, e.g., between 0.25 Å and 0.5 Å. 
To restate, a match occurs, in one example, when three 
hot spots forming a triangle and three atoms of the ligand 
forming a triangle substantially match. That is, a match 
occurs when the triangles are sufficiently similar with the 
vertices of each triangle being the same type and the 
corresponding edges of similar length. The matching 
algorithm finds all matches between atoms of a given 
conformation and the hot spots. Each match then determines 
a unique rigid body transformation. The rigid body 
transformation is then used to bring the conformation into 
the binding site to form the initial protein-ligand complex. 
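The triangle-matching test can be illustrated with a brute-force enumeration. This sketch is not the geometric hashing scheme referred to in this description; exhaustive search is swapped in because it is short and easy to check. Atoms and hot spots are assumed to be given as `(type, (x, y, z))` tuples, and `find_matches` is a hypothetical name.

```python
import math
from itertools import combinations, permutations

PAIRS = ((0, 1), (0, 2), (1, 2))


def find_matches(atoms, hot_spots, delta=0.5):
    """Return (atom indices, hot spot indices) for every matching triplet."""
    matches = []
    for h in combinations(range(len(hot_spots)), 3):
        for a in permutations(range(len(atoms)), 3):
            # Condition i: corresponding vertices must have the same type.
            if any(atoms[a[j]][0] != hot_spots[h[j]][0] for j in range(3)):
                continue
            # Condition ii: corresponding edges must agree to within delta.
            if all(abs(math.dist(atoms[a[j]][1], atoms[a[k]][1])
                       - math.dist(hot_spots[h[j]][1], hot_spots[h[k]][1]))
                   <= delta for j, k in PAIRS):
                matches.append((a, h))
    return matches


hot_spots = [("apolar", (0.0, 0.0, 0.0)),
             ("polar", (3.0, 0.0, 0.0)),
             ("apolar", (0.0, 4.0, 0.0))]
atoms = [("apolar", (1.0, 1.0, 1.0)),
         ("polar", (4.0, 1.0, 1.0)),
         ("apolar", (1.0, 5.0, 1.0)),
         ("apolar", (20.0, 20.0, 20.0))]
matches = find_matches(atoms, hot_spots)
```

Pairing unordered hot spot triplets with ordered atom triplets visits each atom-to-hot-spot correspondence exactly once. The cost grows with the cube of both list sizes, which is why a hashing scheme is attractive for real binding sites.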

In step 520, each match determines a unique rigid body 
transformation that minimizes 

$$I(R, T) = \sum_{j=1}^{3} \left| H_j - R A_j - T \right|^2$$

where R is, for instance, a 3x3 rotation matrix and T is a 
translation vector. Again, a rigid body transformation 
comprises, in one example, a 3x3 rotation matrix, R, and a 
translation vector, T, so that points X (the position of an 
atom of the conformation) are transformed by RX+T. Each 
rigid body transformation, which can be determined 
analytically, is then used to place the ligand conformation 
into the binding site 530. For this aspect of the 
calculation, several algorithms for finding all matches were 
tested. The geometric hashing algorithm developed for FlexX 
(see: Rarey, M., S. Wefing, and T. Lengauer, "Placement of 
Medium-sized Molecular Fragments Into Active Sites of 
Proteins," Journal of Computer-Aided Molecular Design, 1996, 
Vol. 10, p. 41-54, which is hereby incorporated herein by 
reference in its entirety), proved to be the most efficient. 
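The text states only that the transformation can be determined analytically. One standard closed-form route, offered here as an illustration and not necessarily the one used in the described embodiment, is the least-squares superposition based on a singular value decomposition. With $A_j$ the matched atom positions and $H_j$ the matched hot spots:

```latex
\bar{A} = \tfrac{1}{3}\sum_{j=1}^{3} A_j, \qquad
\bar{H} = \tfrac{1}{3}\sum_{j=1}^{3} H_j, \qquad
C = \sum_{j=1}^{3} \left(H_j - \bar{H}\right)\left(A_j - \bar{A}\right)^{\mathsf{T}} = U S V^{\mathsf{T}},

R = U \,\operatorname{diag}\!\bigl(1,\ 1,\ \det(U V^{\mathsf{T}})\bigr)\, V^{\mathsf{T}}, \qquad
T = \bar{H} - R\,\bar{A}.
```

Setting the derivative of $I(R, T)$ with respect to $T$ to zero gives $T = \bar{H} - R\bar{A}$, after which minimizing over $R$ is equivalent to maximizing $\operatorname{tr}(R\,C^{\mathsf{T}})$. The determinant factor keeps $R$ a proper rotation rather than a reflection.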

Optimization Stage 

A single conformation can produce up to 10,000 matches. 
In the interest of efficiency, most of these matches cannot 
be optimized, so a pruning/scoring strategy is desired. 
FIG. 6 depicts one such strategy. 

Referring to FIG. 6, initially all matches for which 
more than a predetermined percentage (e.g., 10%) of the 
ligand atoms have a steric clash can be eliminated 610. The 
remaining matches are ranked using an atom pairwise score 
described below, with an atom score cutoff of, for example, 
1.0 620. Use of a cutoff allows matches that fit reasonably 
well with a few steric clashes to survive to the final 
round, and the choice of 1.0 is merely exemplary. After 
being ranked, the matches are clustered, and the top N 
matches are selected to move into the final stage 630, where 
N may comprise, for instance, a number in the range of 
25-100. 
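The first pruning step 610 can be sketched as below. The text does not define when an atom counts as clashing, so the sketch assumes a single hypothetical distance threshold, `clash_dist`, between a ligand atom and any protein atom. Poses are assumed to be lists of ligand atom coordinates already placed in the binding site.

```python
import math


def clash_fraction(ligand_atoms, protein_atoms, clash_dist=2.5):
    """Fraction of ligand atoms closer than clash_dist to any protein atom."""
    clashing = sum(
        1 for a in ligand_atoms
        if any(math.dist(a, p) < clash_dist for p in protein_atoms)
    )
    return clashing / len(ligand_atoms)


def prune_matches(poses, protein_atoms, max_fraction=0.10, clash_dist=2.5):
    """Discard poses in which more than max_fraction of the atoms clash."""
    return [pose for pose in poses
            if clash_fraction(pose, protein_atoms, clash_dist) <= max_fraction]


protein = [(0.0, 0.0, 0.0)]
far = [(10.0 + i, 0.0, 0.0) for i in range(10)]
one_clash = [(1.0, 0.0, 0.0)] + far[:9]
two_clash = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)] + far[:8]
kept = prune_matches([far, one_clash, two_clash], protein)
```

A pose with exactly 10% of its atoms clashing survives, since only poses with more than the predetermined percentage are eliminated.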

Each remaining match is optimized using a BFGS 
optimization algorithm with a simple atom pairwise score 
640. In one embodiment, the score can be modeled after the 
Piecewise Linear Potential (see Gehlhaar, D.K., et al., 
"Molecular Recognition of The Inhibitor AG-1343 By HIV-1 
Protease: Conformationally Flexible Docking by Evolutionary 
Programming," Chemistry & Biology, 1995, Vol. 2, p. 317-324, 
which is hereby incorporated herein by reference in its 
entirety) with a difference being that the score used herein 
is preferably differentiable. For this score, all hydrogens 
are ignored, and all non-hydrogen atoms are classified into 
one of four categories: 



i. Apolar - anything that cannot form a hydrogen 
bond. 

ii. Acceptor - any atom that can act as a hydrogen 
bond acceptor but not as a donor. 

iii. Donor - any atom that can act as a hydrogen bond 
donor, but not as an acceptor. 

iv. Donor/Acceptor - any atom that can act as both a 
hydrogen bond donor and an acceptor. 



The score between two atoms is calculated using either a 
hydrogen bonding potential or a steric potential. The two 
potentials, shown in FIG. 7, have the mathematical form 



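$$F(r) = \varepsilon \left[ \left( \frac{(1+\alpha) R_{\min}^2}{r^2 + \alpha R_{\min}^2} \right)^{6} - 2 \left( \frac{(1+\alpha) R_{\min}^2}{r^2 + \alpha R_{\min}^2} \right)^{3} \right] \Phi\left(r^2; r_i^2, r_o^2\right)$$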



where $R_{\min}$ is the position of the score minimum, 
$\varepsilon$ is the depth of the minimum, $\alpha$ is a 
softening factor, and $\Phi$ is a differentiable cutoff 
function of r (the distance between the pair of atoms) 
having the properties that $\Phi = 1$ when $r < r_i$ and 
$\Phi = 0$ when $r > r_o$. Each potential, steric and 
hydrogen bonding, is assigned its own set of parameters. 
The parameters for these potentials can be chosen by one 
skilled in the art via intuition and subsequent testing, but 
they do not need to be fully optimized. Table 2 contains 
example parameters for the pairwise potentials. 



Table 2 

Parameter    Hydrogen bonding potential    Steric potential 
ε            2.0                           0.4 
α            0.5                           1.5 
R_min        3.0 Å                         4.05 Å 
r_i          3.0 Å                         5.0 Å 
r_o          4.0 Å                         6.0 Å 


These potentials are very similar to the 12-6 van der 
Waals potentials used in many force fields, with two 
differences. First, the softening factor, α, makes the 
potentials significantly softer than the typical 12-6 van 
der Waals potentials (see FIG. 7), i.e., mild steric clashes 
common in docking runs are tolerated by this potential. In 
spirit, the softening factor implicitly models small induced 
fit effects of the protein, which can be important (see 
Murray, C.W., C.A. Baxter, and D. Frenkel, "The Sensitivity 
of The Results of Molecular Docking to Induced Fit Effects: 
Application to Thrombin, Thermolysin and Neuraminidase," 
Journal of Computer-Aided Molecular Design, 1999, Vol. 12, 
p. 547-562, which is hereby incorporated herein by reference 
in its entirety), and in practice, makes the potential much 
more error tolerant. The second difference is the cutoff 
function. This function guarantees that the potential is 
zero beyond a finite distance, usually between 5.0 Å and 5.0 
Å. This, along with some organization of the protein atoms, 
significantly speeds up the direct calculation of the score. 

An attempt was made to calculate the scores both 
directly and through precalculated grids. The advantage of 
using the grids is that the score can be calculated very 
rapidly. Grids were found to be 5-10 times faster than the 
direct calculation. The advantage of the direct calculation 
is that effects, such as protein flexibility and solvent 
mobility, can be accommodated more easily. Since using the 
grids did not seem to cause any deterioration in the quality 
of the docking results and since protein flexibility or 
solvent mobility is currently not included, for the results 
presented hereinbelow, the scores were calculated through 
precalculated grids. For the purpose of the BFGS 
optimization algorithm, all derivatives were calculated 
analytically including those with respect to the rotatable 
bonds (see Haug, E.J. and M.K. McCullough, "A
Variational-Vector Calculus Approach to Machine Dynamics,"
Journal of Mechanisms, Transmissions, and Automation in
Design, 1986, Vol. 108, p. 25-30, which is hereby
incorporated herein by reference in its entirety).
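
By way of illustration only, the trade-off between direct and grid-based scoring can be sketched as follows. The sketch assumes a regular cubic grid and trilinear interpolation; neither the grid layout nor the interpolation scheme is specified above, and the names build_grid and grid_score are hypothetical.

```python
import math

def build_grid(func, origin, spacing, shape):
    """Tabulate func(x, y, z) once on a regular grid (nested lists [i][j][k]).

    'func' stands in for the per-atom-type protein score; it is a placeholder.
    """
    nx, ny, nz = shape
    return [[[func(origin[0] + i * spacing,
                   origin[1] + j * spacing,
                   origin[2] + k * spacing)
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]

def grid_score(grid, origin, spacing, point):
    """Trilinearly interpolate the tabulated score at an arbitrary point.

    Points outside the grid are extrapolated linearly from the edge cell.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    idx, frac = [], []
    for axis in range(3):
        t = (point[axis] - origin[axis]) / spacing
        i = min(max(int(math.floor(t)), 0), dims[axis] - 2)  # valid cell index
        idx.append(i)
        frac.append(t - i)
    (i, j, k), (fx, fy, fz) = idx, frac
    total = 0.0
    # Weighted sum over the eight corners of the enclosing grid cell.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1.0 - fx) *
                          (fy if dj else 1.0 - fy) *
                          (fz if dk else 1.0 - fz))
                total += weight * grid[i + di][j + dj][k + dk]
    return total
```

The lookup costs eight table reads per ligand atom regardless of the number of protein atoms, which is the source of the speed advantage; the cost is that the table must be rebuilt whenever the protein or solvent moves.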

Test Results 

To test the docking procedure of the present invention,
the Gold test set was used (see Jones, G., et al.,
"Development and Validation of a Genetic Algorithm for
Flexible Docking," Journal of Molecular Biology, 1997, Vol.
267, p. 727-748, which is hereby incorporated herein by
reference in its entirety). Any covalently bound ligand or
any ligand bound to a metal ion was removed because it
cannot, at present, be modeled by the scoring function
described herein. In addition, any "surface sugars" were
removed as they are not typical of the problems encountered.
This left a total of 103 cases (see Table 1 below). No
further individual processing of the test cases was
performed. (Note that the "Protein Data Bank" (PDB) is a
database where protein structures are placed. The "PDB
Code" is a four letter code that allows a given structure to
be found and extracted from the PDB.)

Table 1

PDB Code    Number of Rot Bonds    Minimum RMSD    RMSD of Top

(Table 1 entries are not legible in the source text.)


As expected, the rms deviation between the bound
conformation (X-ray) and the closest computationally
generated conformation increases with the number of
rotatable bonds. In all but 5 cases, at least one
conformation was generated by the conformational search
within 1.5 Å rms deviation of the bound conformation. The
most interesting aspect of the conformational search results
is that for some of the more rigid ligands, the minimum rms
deviation was large. For example, there are several ligands
with fewer than five rotatable bonds, but with a minimum rms
deviation near 1.0 Å. This occurs for two reasons. First,
a clustering radius of 1.0 Å was used in all cases. This
prevented the conformational space of small ligands from
being sufficiently sampled. However, it is within the scope
of the present invention that a clustering radius dependent
on the molecule size could be used to alleviate this
particular problem. The second problem is that a bond
between two sp2 atoms was always treated as being
conjugated. Thus, whenever this type of bond is
encountered, it is strongly restrained to be planar. While
bonds between two sp2 atoms are often conjugated, this is
clearly an over-simplification. This may be addressed, in
accordance with the invention, by allowing the dihedral
angles between two sp2 atoms to deviate from planarity.
This deviation can then be penalized according to the degree
of conjugation. The penalty could be chosen crudely based
on the types of the sp2 atoms (see S.L. Mayo, B.D. Olafson,
and W.A. Goddard, "DREIDING: A Generic Force Field for
Molecular Simulations," J. Phys. Chem., 1990, Vol. 94, p.
8897).

The Docking Results

For the docking runs, two different sets of parameters
were tested to see their effects on the quality and speed of
the docking runs: one for high quality docking and one for
rapid searches. The key differences between the two sets of
parameters are the match tolerance and the number and length
of the BFGS optimization runs. The match tolerance ranges
from 0.5 Å for the high quality runs to 0.25 Å for the rapid
searches. Note that the larger the tolerance, the more
matches will be found. Thus, a larger tolerance means a more
thorough search, while a smaller tolerance denotes a less
thorough but faster search. For the high quality runs, a
maximum of 100 matches per ligand were optimized for 100
steps, compared to 25 matches per ligand for 20 steps for
the rapid searches.
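
For illustration, the two parameter sets described above can be written down as a small configuration sketch. The class and field names are hypothetical; only the numerical values are taken from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DockingParams:
    """Hypothetical container for the run parameters named in the text."""
    match_tolerance: float  # match tolerance, in angstroms
    max_matches: int        # maximum matches optimized per ligand
    bfgs_steps: int         # BFGS optimization steps per match

# Values as stated for the two kinds of run.
HIGH_QUALITY = DockingParams(match_tolerance=0.5, max_matches=100, bfgs_steps=100)
RAPID = DockingParams(match_tolerance=0.25, max_matches=25, bfgs_steps=20)

def max_optimization_steps(params):
    """Upper bound on BFGS steps spent on one ligand."""
    return params.max_matches * params.bfgs_steps
```

Under these settings a high quality run may spend up to 10,000 BFGS steps per ligand against at most 500 for a rapid search, which gives a sense of where the difference in CPU time comes from.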

The first problem is to generate at least one docked
position within a given rms deviation cutoff. Here,
terminology is adopted that a ligand that is docked to
within X Å of the crystallographically observed position of
the ligand is referred to as an X Å hit. The rms
deviations are shown for the high quality runs in Table 1.
For the high quality runs, 89 of the 103 cases produce at
least one 2.0 Å hit. The numbers drop to 80 at 1.5 Å, 63
at 1.0 Å and 26 at 0.5 Å. For the rapid searches, 75 of
the 103 cases produce a 2.0 Å hit, 65 produce a 1.5 Å hit,
42 produce a 1.0 Å hit and 16 produce a 0.5 Å hit. In both
cases, these numbers compare favorably with similar
statistics from other docking packages that have been tested
on the Gold or similar test sets (see Jones, G., et al.,
"Development and Validation of a Genetic Algorithm for
Flexible Docking," Journal of Molecular Biology, 1997, Vol.
267, p. 727-748; Baxter, C.A., et al., "Flexible Docking
Using Tabu Search and an Empirical Estimate of Binding
Affinity," PROTEINS: Structure, Function, and Genetics,
1998, Vol. 33, p. 367-382; Rarey, M., B. Kramer, and T.
Lengauer, "The Particle Concept: Placing Discrete Water
Molecules During Protein-ligand Docking Predictions,"
PROTEINS: Structure, Function, and Genetics, 1999, Vol. 34,
p. 17-28; Rarey, M., B. Kramer, and T. Lengauer, "Docking of
Hydrophobic Ligands With Interaction-based Matching
Algorithms," Bioinformatics, 1999, Vol. 15(3), p. 243-250;
and Kramer, B., M. Rarey, and T. Lengauer, "Evaluation of
the FlexX Incremental Construction Algorithm for
Protein-Ligand Docking," PROTEINS: Structure, Function, and
Genetics, 1999, Vol. 37, p. 228-241).
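
The "X Å hit" terminology can be made concrete with a short sketch. It assumes that the docked and observed ligands list the same atoms in the same order and already share the protein's coordinate frame, so no superposition is applied; symmetry-equivalent atoms are not handled. The function names are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

def count_hits(best_rmsd_per_case, cutoffs=(2.0, 1.5, 1.0, 0.5)):
    """For each cutoff X, count the test cases with a docked position within X."""
    return {cutoff: sum(1 for value in best_rmsd_per_case if value <= cutoff)
            for cutoff in cutoffs}
```

Passing the per-case minimum rms deviations to count_hits yields the kind of tally reported above for the 2.0, 1.5, 1.0 and 0.5 Å cutoffs.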

The second problem is to correctly rank the docked
compounds, i.e., is the top ranked conformation reasonably
close to the crystallographically observed position for the
ligand. This is a significantly more difficult problem than
the first. The rms deviations between the top scoring docked
position and the observed position for the high quality runs
are given in Table 1. In this case, there is little
difference between the two sets of parameters. For the high
quality runs, 48 of the 103 cases produce a 2.0 Å hit as
the top scoring docked position. This number drops to 41 at
1.5 Å, 34 at 1.0 Å and 10 at 0.5 Å. For the rapid
searches, 45 of the 103 cases produce a 2.0 Å hit as the
top scoring docked position, with 41 at 1.5 Å, 34 at 1.0 Å
and 10 at 0.5 Å.

The utility of the scoring function used in this study
lies less as a tool to absolutely rank the docked
conformations than as an initial filter to select only a few
docked conformations. Most of the well docked positions,
i.e., low rms deviations, survive this 10% cutoff. Most of
the docked positions, however, do not. For the high quality
runs, on average 74 positions are found, but after the 10%
cutoff on average only 8 remain. For the rapid searches, on
average nearly 21 positions are found, but after the cutoff
on average only 5 remain. At this point, the docked
positions that survive the 10% score cutoff could be further
optimized, visually screened, or passed to a more accurate,
but less efficient scoring function.
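
A minimal sketch of such a score filter follows. It assumes that lower scores are better and that the 10% cutoff is measured relative to the best score found for the ligand; both are assumptions made for illustration, and the function name is hypothetical.

```python
def filter_by_score_cutoff(scored_positions, fraction=0.10):
    """Keep the docked positions whose score lies within 'fraction' of the best.

    scored_positions: list of (score, position) pairs, lower score is better.
    Returns the surviving pairs sorted from best to worst score.
    """
    if not scored_positions:
        return []
    best = min(score for score, _ in scored_positions)
    limit = best + fraction * abs(best)  # worst score still allowed through
    survivors = [pair for pair in scored_positions if pair[0] <= limit]
    return sorted(survivors, key=lambda pair: pair[0])
```

The survivors are the positions that would be handed on for further optimization, visual screening, or rescoring.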

For the high quality runs, the average CPU time (e.g.,
using a Silicon Graphics Incorporated (SGI) computer R12000)
per test case is approximately 4.5 seconds. At this rate,
screening one million compounds with one CPU would take
about 50 days. For the rapid searches, the average CPU time
per test case drops to approximately 1.1 seconds. At this
rate, screening one million compounds with one CPU would
take about 12 days. Because database docking is a highly
parallel job, multiple CPUs could easily cut this to a
reasonable amount of time (for example, a day or so).
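
The throughput estimates above follow from simple arithmetic, sketched here under the assumptions that one library compound costs about as much CPU time as one test case and that the work divides evenly across CPUs. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

def screening_days(n_compounds, cpu_seconds_per_compound, n_cpus=1):
    """Wall-clock days needed to dock a library, assuming ideal parallel scaling."""
    return n_compounds * cpu_seconds_per_compound / (n_cpus * SECONDS_PER_DAY)

high_quality_days = screening_days(1_000_000, 4.5)  # about 52 days on one CPU
rapid_days = screening_days(1_000_000, 1.1)         # about 12.7 days on one CPU
```

With roughly fifty CPUs, the high quality screen of one million compounds would come down to about a day under the same assumptions.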

Some Specific Successful Cases

In this section, a few of the successful cases are
shown to demonstrate the strengths of the approach described
herein to docking small molecules. In all of these cases,
the results shown are from the medium quality docking runs.
The first case is the dipeptide Ile-Val from the PDB entry
3tpi (see Marquart, M., et al., "The Geometry of the
Reactive Site and of the Peptide Groups in Trypsin,
Trypsinogen and Its Complexes With Inhibitors," Acta
Crystallographica, 1983, Vol. B39, p. 480, which is hereby
incorporated herein by reference in its entirety). This
case has no clear anchor fragment and, as a result, the
incremental construction approach to docking might have
difficulties with this ligand. Our conformational search
procedure produced a conformation within 0.42 Å of the
observed conformation. The rms deviation between the best
scoring docked position and the observed position is 0.53 Å.



The second example, with a ligand having 15 rotatable
bonds, is a much more difficult example. It is an HIV
protease inhibitor from the PDB entry 1ida (see Tong, L.,
et al., "Crystal Structures of HIV-2 Protease In Complex
With Inhibitors Containing Hydroxyethylamine Dipeptide
Isostere," Structure, 1995, Vol. 3(1), p. 33-40, which is
hereby incorporated herein by reference in its entirety).
In this case the conformational search procedure was able to
generate a conformation with an rms deviation of 0.96 Å
from the bound conformation. The rms deviation for the top
scoring docked position is 1.38 Å. In fact, the top 13
scoring docked positions are all within 2.0 Å of the
observed position, with the closest near 1.32 Å.



The final case is an HIV protease inhibitor from the
PDB entry 4phv (see Bone, R., et al., "X-ray Crystal
Structure of The HIV Protease Complex With L-700,417, An
Inhibitor With Pseudo C2 Symmetry," Journal of the American
Chemical Society, 1991, Vol. 113(24), p. 9382-9384, which
is hereby incorporated herein by reference in its entirety).
The ligand in this case has 12 rotatable bonds. This
clearly demonstrates the value of including the final
flexible gradient optimization step of the ligand. The
closest conformation produced from the conformational search
procedure is 1.32 Å from the crystallographically observed
conformation. With an rms deviation of 0.38 Å, the top
scoring docked position is also the closest to the observed
position. The smallest rms deviation that could have been
obtained without the flexible optimization is that of the
closest conformation generated by the conformational search
procedure, i.e., 1.32 Å. Thus, in this case, the flexible
optimization decreased the final rms deviation by at least
0.9 Å (from 1.32 Å at best to 0.38 Å).

An Analysis of the Errors and Avenues for Improvement 

It is often assumed that when a docking simulation fails,
the score has failed, i.e., the global minimum of the
scoring function did not correspond to the
crystallographically determined position for the ligand.
Since the docking problem involves many degrees of freedom,
it is reasonable to believe that in many cases the failure
can be attributed to insufficient search. It is the goal of
this section to identify the cause of failure in the cases
in which the procedure described herein performed poorly.

To classify docking failures as either scoring failures
or search failures, the ligand was taken as bound to the
protein and a BFGS optimization was performed. If the
resulting score was significantly less than the best score
found from the docking runs, the failure is classified as a
search failure. Every other failure is classified as a
scoring failure.
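
The classification rule can be sketched as follows. The text does not quantify "significantly less", so the margin parameter below is a hypothetical stand-in, and the sketch assumes that lower scores are better.

```python
def classify_failure(optimized_xray_score, best_docked_score, margin=0.0):
    """Label a failed docking case as a search failure or a scoring failure.

    optimized_xray_score: score after BFGS optimization started from the
        crystallographic ligand position.
    best_docked_score: best score found during the docking run.
    margin: how much lower the optimized X-ray score must be to count as
        "significantly less" (an assumed threshold, not given in the text).
    """
    if optimized_xray_score < best_docked_score - margin:
        # A better-scoring position exists near the crystal position,
        # but the docking search never reached it.
        return "search failure"
    # The search already found positions scoring at least as well as the
    # crystal position, so the score itself is misleading.
    return "scoring failure"
```

For example, with a margin of 5 score units, an optimized X-ray score of -120 against a best docked score of -100 would be labeled a search failure, while -102 against -100 would be labeled a scoring failure.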



The vast majority of the cases qualify as moderate
scoring errors, i.e., the global minimum appears not to
correspond to the crystallographic position of the ligand,
but the percent difference between the global minimum and
the best score near the crystallographic position of the
ligand is less than 10%. In these cases, it is difficult to
decide which aspects of the score are failing, but it is
reasonable to believe that many of these cases can be
corrected simply by including some more detail in the
scoring function, such as angular constraints on the
hydrogen bonding term or a solvation model. There are,
however, a few cases with dramatic scoring errors. These
cases provide some insight into the weakness of the score
and the complexities of protein/ligand interactions.

The case 1glq (see Garcia-Saez, I., et al., "Molecular
Structure at 1.8 Å of Mouse Liver Class pi Glutathione
S-Transferase Complexed With S-(p-Nitrobenzyl)Glutathione and
Other Inhibitors," Journal of Molecular Biology, 1994, Vol.
237, p. 298-314) pointed out the main weakness of the score
used in this study: hydrogen bonding patterns. This is a
polar ligand. The top ranked position for this ligand
scores very well largely because there are many "perceived"
hydrogen bonds. In reality, these hydrogen bonds would be
extremely weak because the angular dependence of the
interaction is poor. Moreover, the sulfur atom in the X-ray
position is accepting a hydrogen bond from the OH of a
tyrosine and the carboxylic acid is involved in a salt
bridge with a lysine. Neither of these interactions was
recognized by the scoring function described herein.



In the case 1ive (see Jedrzejas, M.J., et al.,
"Structures of Aromatic Inhibitors of Influenza Virus
Neuraminidase," Biochemistry, 1995, Vol. 34, p. 3144-3151),
the correct position receives a relatively poor score
largely due to the estimated strain of the observed
conformation. The present invention recognizes certain
bonds as being conjugated. Thus, a stiff penalty is applied
when these bonds are not planar. In the observed
conformation, the dihedral angles are all nearly 80° from
planar. If these dihedral angles are forced to be near 0°,
the conformation is no longer compatible with the observed
interactions between the ligand and the protein. It would
be difficult for any docking algorithm to predict these
values for the dihedral angles.

The case 1hef (see Murthy, K.H.M., et al., "The Crystal
Structures at 2.2-Å Resolution of Hydroxyethylene-Based
Inhibitors Bound to Human Immunodeficiency Virus Type 1
Protease Show That The Inhibitors are Present in Two
Distinct Orientations," Journal of Biological Chemistry,
1992, Vol. 267, p. 22770-22778), an HIV protease inhibitor,
is perhaps the most interesting of all of the dramatic
scoring errors. The binding pocket is at the interface of a
dimer with the protein monomers being related through a
crystallographic symmetry operation. At the C-terminus of
the ligand, a methyl group is within 2.0 Å. These
interactions would be extremely difficult to predict. Our
program did come up with an interesting alternate
conformation for the C-terminus of the ligand. This
conformation eliminates both the internal and external
steric clashes and forms an additional hydrogen bond with
the protein.

There are two cases that can be classified as
conformational search failures: 1hef and 1poc. In these
cases the best conformation produced is 2.1 Å and 2.3 Å,
respectively. The ligand in the case 1poc has 23 rotatable
bonds, and thus, it is very difficult to fully cover its
conformational space with only 50 conformers. While the
ligand in the case 1hef is also very flexible (18 rotatable
bonds), the observed conformation, as described above, also
has a serious steric clash. Thus, this is, as should be
expected, a very difficult challenge for any conformational
search procedure.

Conclusions

In this application, a new rapid technique for docking
flexible ligands into the binding sites of proteins is
presented. The method is based on a pre-generated set of
conformations for the ligand and a final flexible
gradient-based optimization of the ligand in the binding
site of the protein. Based on the results, this is a robust
approach to handling ligand flexibility. With relatively few
conformations (less than 50 per molecule), usually a
conformation within 1.5 Å of the bound conformation can be
generated. Applying the flexible optimization as the final
step reduces the number of conformations required while
maintaining high quality final docked positions.

There are opportunities to improve the exemplified
docking technique. Such improvements also fall within the
scope of the present invention. For example, the conformer
generation, while reasonably successful, should treat small
relatively rigid molecules and large flexible molecules
differently. Since the conformational space of very large
flexible molecules is too large to explore thoroughly, a
Monte Carlo search algorithm is used. In addition, the
score used to rank the conformations is certainly simplistic
and can be improved. For example, variations of solvation
models (see Eisenberg, D. and A.D. McLachlan, "Solvation
Energy in Protein Folding and Binding," Nature, 1986, Vol.
319, p. 199-203; Still, W.C., et al., "Semianalytical
Treatment of Solvation For Molecular Mechanics and
Dynamics," Journal of the American Chemical Society, 1990,
Vol. 112, p. 6127-6129, both of which are hereby
incorporated herein by reference in their entirety) would
likely give better conformations. Finally, a better
treatment of strain, particularly that for rotation about
bonds between two sp2 atoms, might lead to improved results.

In the embodiment exemplified, the algorithm used to
find the polar hot spots tends to find any hydrogen bond
donor and acceptor rather than those buried in the binding
site. Improving the hot spot search routine will not only
increase the quality of the technique, but will also
decrease the number of hot spots needed and, thus, make the
technique more efficient. Some available programs, such as
GRID (see Goodford, P.J., "A Computational Procedure for
Determining Energetically Favorable Binding Sites on
Biologically Important Macromolecules," Journal of Medicinal
Chemistry, 1985, Vol. 28(7), p. 849-857; and Still, W.C., et
al., "Semianalytical Treatment of Solvation For Molecular
Mechanics and Dynamics," Journal of the American Chemical
Society, 1990, Vol. 112, p. 6127-6129, both of which are
hereby incorporated herein by reference in their entirety)
or the LUDI binding site description (see Bohm, H.J.,
"LUDI: Rule-based Automatic Design of New Substituents For
Enzyme Inhibitor Leads," Journal of Computer-Aided Molecular
Design, 1992, Vol. 6, p. 593-606, which is hereby
incorporated herein by reference in its entirety) or a
documented method (see Mills, J.E.J., T.D.J. Perkins, and
P.M. Dean, "An Automated Method For Predicting The Positions
of Hydrogen-bonding Atoms In Binding Sites," Journal of
Computer-Aided Molecular Design, 1997, Vol. 11, p. 229-242,
which is hereby incorporated herein by reference in its
entirety) would likely show some improvement. In addition,
separating the polar hot spots into donor, acceptor, ionic,
etc., hot spots might improve the results. Finally, in a
practical application, most users would be willing to spend
some time to enhance the image, i.e., eliminate bad hot
spots by hand and add hot spots where needed. In practice,
this will significantly improve docking runs.

In all docking programs, a good score should be
efficient, error tolerant, and accurate. The score used
here satisfies the first two qualities. These two
qualities, however, are usually not compatible with the
third. It appears that this score will still be useful as
an initial screen after which a more accurate score can be
applied. Geometric constraints for the hydrogen bonding
term, recognition of ionic interactions and solvation
effects, and terms for dealing with metals can be introduced
to improve accuracy.

Nonetheless, when a crystal structure is available, the
approach of the present invention to molecular docking is
useful in library screening prioritization. Even with lower
quality structural information, such as a homology model,
the technique described herein will still provide useful
information.

The capability of the present invention can readily be 
automated by creating a suitable program, in software, 
hardware, microcode, firmware or any combination thereof. 
Further, any type of computer or computer environment can be 
employed to provide, incorporate and/or use the capability 
of the present invention. One such environment is depicted 
in FIG. 8 and described in detail below. 

In one embodiment, a computer environment 800 includes, 
for instance, at least one central processing unit 810, a 
main storage 820, and one or more input/output devices 830, 
each of which is described below. 

As is known, central processing unit 810 is the 
controlling center of computer environment 800 and provides
the sequencing and processing facilities for instruction 
execution, interruption action, timing functions, initial 
program loading and other machine related functions. The 
central processing unit executes at least one operating 
system, which as known, is used to control the operation of 
the computing unit by controlling the execution of other 
programs, controlling communication with peripheral devices 
and controlling use of the computer resources. 

Central processing unit 810 is coupled to main storage
820, which is directly addressable and provides for
high-speed processing of data by the central processing unit.
Main storage may be either physically integrated with the
CPU or constructed in stand-alone units.

Main storage 820 is also coupled to one or more 
input/output devices 830. These devices include, for 
instance, keyboards, communications controllers, 
teleprocessing devices, printers, magnetic storage media 
(e.g., tape, disk), direct access storage devices, and
sensor-based equipment. Data is transferred from main 
storage 820 to input/output devices 830, and from the 
input/output devices back to main storage. 

The present invention can be included in an article of
manufacture (e.g., one or more computer program products)
having, for instance, computer usable media. The media has
embodied therein, for instance, computer readable program
code means for providing and facilitating the capabilities
of the present invention. The articles of manufacture can
be included as part of a computer system or sold separately.

Additionally, at least one program storage device 
readable by a machine, tangibly embodying at least one 
program of instructions executable by the machine to perform 
the capabilities of the present invention can be provided. 

The flow diagrams depicted herein are just exemplary.
There may be many variations to these diagrams or the steps
(or operations) described therein without departing from the
spirit of the invention. For instance, the steps may be
performed in a differing order, or steps may be added,
deleted or modified. All of these variations are considered
a part of the claimed invention.

Although preferred embodiments have been depicted and
described in detail herein, it will be apparent to those
skilled in the relevant art that various modifications,
additions, substitutions, and the like can be made without
departing from the spirit of the invention and these are
therefore considered to be within the scope of the invention
as defined by the following claims.

Claims

1. A method of docking a ligand to a protein comprising:
    performing a pre-docking conformational search to
    generate multiple solution conformations of the ligand;
    generating a binding site image of the protein, said
    binding site image comprising multiple hot spots;
    matching hot spots of the binding site image to atoms in
    at least one solution conformation of the multiple
    solution conformations of the ligand to obtain at least
    one ligand position relative to the protein in a
    ligand-protein complex formation; and
    optimizing the at least one ligand position while
    allowing translation, orientation and rotatable bonds of
    the ligand to vary, and while holding the protein fixed.

2. The method of claim 1, wherein said performing the
pre-docking conformational search comprises creating a
database of the multiple solution conformations and storing
said three-dimensional database for subsequent use by said
matching.

3. The method of claim 2, wherein said database of the
multiple solution conformations comprises a conformational
database of a combinatorial library.

4. The method of claim 1, wherein said performing the
pre-docking conformational search comprises:
    randomly generating a plurality of uniformly distributed
    conformations of the ligand;
    minimizing a strain of each conformation of the
    plurality of uniformly distributed conformations;
    using the strain and a solvent accessible surface area
    of each conformation to rank the conformations; and
    clustering the conformations and retaining a desired
    number of top clusters of conformations, said retained
    number of top clusters of conformations comprising said
    multiple solution conformations of the ligand.

5. The method of claim 1, wherein said generating the
binding site image includes at least one of creating a list
of apolar hot spots identifying points in the binding site
that are favorable for an apolar atom to bind, and
generating a list of polar hot spots identifying points in
the binding site that are favorable for a hydrogen bond
donor or acceptor to bind.

6. The method of claim 5, wherein said generating the
binding site image further comprises:
    placing a grid around the binding site of the protein;
    determining a hot spot search volume using said grid;
    determining hot spots using a grid-like search of the
    hot spot search volume; and
    for each type of hot spot, clustering the hot spots and
    retaining a desired number of clusters of hot spots with
    best scores, said desired number of clusters comprising
    said multiple hot spots to be employed by said matching.

7. The method of claim 1, wherein said matching comprises:
    matching atoms of the at least one solution conformation
    to appropriate hot spots of the protein by positioning
    the at least one solution conformation as a rigid body
    into the binding site image;
    defining a match, said match determining a unique rigid
    body transformation; and
    using the unique rigid body transformation to place the
    at least one solution conformation of the ligand into
    the binding site of the protein.

8. The method of claim 7, wherein said determining the
unique rigid body transformation comprises determining the
unique rigid body transformation that minimizes:

    $$ I(R, T) = \sum_{j} \left| H_j - R A_j - T \right|^2 $$

where:
    $H_j$ = a j-th hot spot of the protein;
    $A_j$ = a j-th atom of the at least one solution
    conformation;
    $R$ = a 3x3 rotation matrix; and
    $T$ = a translation vector.

9. The method of claim 1, wherein said optimizing comprises
optimizing multiple protein-ligand complex formations, said
optimizing comprising:
    eliminating each ligand position having a predetermined
    percentage of ligand atoms with a steric clash;
    ranking remaining ligand positions using an atom
    pairwise score with a desired atom score cutoff;
    after ranking, clustering the ligand positions and
    selecting a top number n of ligand positions; and
    optimizing each ligand position of the n positions,
    allowing the translation, rotation and rotatable bonds
    of the ligand to vary.

10. The method of claim 9, wherein said optimizing
comprises optimizing each ligand position of the n positions
using a BFGS optimization algorithm with a simple atom
pairwise score, allowing the translation, rotation and
rotatable bonds of the ligand to vary.

11. A system for docking a ligand to a protein comprising:
    means for performing a pre-docking conformational search
    to generate multiple solution conformations of the
    ligand;
    means for generating a binding site image of the
    protein, said binding site image comprising multiple hot
    spots;
    means for matching hot spots of the binding site image
    to atoms in at least one solution conformation of the
    multiple solution conformations of the ligand to obtain
    at least one ligand position relative to the protein;
    and
    means for optimizing the at least one ligand position
    while allowing translation, orientation and rotatable
    bonds of the ligand to vary, and while holding the
    protein fixed.

12. The system of claim 11, wherein said means for
performing the pre-docking conformational search comprises
means for creating a database of the multiple solution
conformations and for storing said three-dimensional
database for subsequent use by said matching.

13. The system of claim 12, wherein said database of the
multiple solution conformations comprises a conformational
database of a combinatorial library.

14. The system of claim 11, wherein said means for
performing the pre-docking conformational search comprises:

means for randomly generating a plurality of
uniformly distributed conformations of the ligand;

means for minimizing a strain of each conformation
of the plurality of uniformly distributed
conformations;

means for using the strain and a solvent
accessible surface area of each conformation to rank
the conformations; and

means for clustering the conformations and
retaining a desired number of top clusters of
conformations, said retained number of top clusters of
conformations comprising said multiple solution
conformations of the ligand.
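
The four steps recited in claim 14 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The ligand is reduced to a list of torsion angles, and the functions `strain` and `sasa_proxy`, the weight `sasa_weight`, the clustering `radius`, and the greedy clustering rule are all hypothetical stand-ins chosen only to make the example runnable.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def wrap(angle):
    """Map an angle in radians into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def strain(torsions):
    """Hypothetical threefold torsional strain (assumption, not a real force field)."""
    return sum(1.0 + math.cos(3.0 * t) for t in torsions)

def sasa_proxy(torsions):
    """Hypothetical stand-in for a solvent accessible surface area term."""
    return sum(1.0 - math.cos(t) for t in torsions)

def minimize_strain(torsions, step=0.05, iterations=200):
    """Plain gradient descent on the strain, varying only the torsions."""
    current = list(torsions)
    for _ in range(iterations):
        # d/dt [1 + cos(3t)] = -3 sin(3t), so a descent step adds 3*step*sin(3t).
        current = [t + step * 3.0 * math.sin(3.0 * t) for t in current]
    return [wrap(t) for t in current]

def torsion_distance(a, b):
    """Largest circular difference between corresponding torsions."""
    worst = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % TWO_PI
        worst = max(worst, min(d, TWO_PI - d))
    return worst

def conformational_search(n_torsions, n_samples, n_keep,
                          sasa_weight=0.1, radius=0.5, seed=0):
    rng = random.Random(seed)
    # 1. Uniformly distributed random conformations.
    samples = [[rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
               for _ in range(n_samples)]
    # 2. Minimize the strain of each conformation.
    relaxed = [minimize_strain(s) for s in samples]
    # 3. Rank by strain combined with the surface-area term (lower is better;
    #    the weighted sum is an assumption of this sketch).
    ranked = sorted(relaxed,
                    key=lambda c: strain(c) + sasa_weight * sasa_proxy(c))
    # 4. Greedy clustering in rank order; each kept entry represents one cluster.
    kept = []
    for conf in ranked:
        if all(torsion_distance(conf, rep) > radius for rep in kept):
            kept.append(conf)
        if len(kept) == n_keep:
            break
    return kept
```

Calling `conformational_search(n_torsions=3, n_samples=200, n_keep=5)` returns up to five mutually distinct low-strain conformations, which play the role of the "multiple solution conformations" handed to the matching step.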

15. The system of claim 11, wherein said means for
generating the binding site image includes at least one of
means for creating a list of apolar hot spots identifying
points in the binding site that are favorable for an apolar
atom to bind, and means for generating a list of polar hot
spots identifying points in the binding site that are
favorable for a hydrogen bond donor or acceptor to bind.

16. The system of claim 15, wherein said means for
generating the binding site image further comprises:

means for placing a grid around the binding site
of the protein;

means for determining a hot spot search volume
using said grid;

means for determining hot spots using a grid-like
search of the hot spot search volume; and

for each type of hot spot, means for clustering
the hot spots and for retaining a desired number of
clusters of hot spots with best scores, said desired
number of clusters comprising said multiple hot spots
to be employed by said matching.
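
The grid-based construction of claim 16 can be sketched as follows, for apolar hot spots only. This is an illustration under stated assumptions and not the claimed scoring scheme: the distance window defining the search volume, the burial-count score in `apolar_score`, and the greedy suppression used as clustering are hypothetical choices, and the protein is reduced to a bare list of atom coordinates.

```python
import itertools
import math

def make_grid(center, half_width, spacing):
    """Place a cubic grid of points around the binding site center."""
    n = int(round(half_width / spacing))
    offsets = [i * spacing for i in range(-n, n + 1)]
    return [(center[0] + dx, center[1] + dy, center[2] + dz)
            for dx, dy, dz in itertools.product(offsets, repeat=3)]

def search_volume(grid, protein, min_dist=2.5, max_dist=5.0):
    """Keep grid points near the protein surface but not overlapping any atom."""
    kept = []
    for point in grid:
        nearest = min(math.dist(point, atom) for atom in protein)
        if min_dist <= nearest <= max_dist:
            kept.append(point)
    return kept

def apolar_score(point, protein, cutoff=5.0):
    """Hypothetical score: number of protein atoms within the cutoff."""
    return sum(1 for atom in protein if math.dist(point, atom) <= cutoff)

def cluster_hot_spots(scored, radius, n_keep):
    """Greedy clustering: take the best point, then suppress its neighbours."""
    kept = []
    for score, point in sorted(scored, key=lambda sp: -sp[0]):
        if all(math.dist(point, q) > radius for _, q in kept):
            kept.append((score, point))
        if len(kept) == n_keep:
            break
    return kept

def find_apolar_hot_spots(protein, center, half_width, spacing,
                          radius=2.0, n_keep=3):
    grid = make_grid(center, half_width, spacing)
    volume = search_volume(grid, protein)
    scored = [(apolar_score(p, protein), p) for p in volume]
    return cluster_hot_spots(scored, radius, n_keep)
```

A polar hot spot list would follow the same pattern with a score that rewards hydrogen-bonding geometry instead of burial; that score is not sketched here.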

17. The system of claim 11, wherein said means for
matching comprises:

means for matching atoms of the at least one
solution conformation to appropriate hot spots of the
protein by positioning the at least one solution
conformation as a rigid body into the binding site
image;

means for defining a match, said match determining
a unique rigid body transformation; and

means for using the unique rigid body
transformation to place the at least one solution
conformation of the ligand into the binding site of the
protein.

18. The system of claim 17, wherein said determining
the unique rigid body transformation comprises determining
the unique rigid body transformation that minimizes:

$$ I(R, T) = \sum_{j=1}^{3} \left| H_j - R A_j - T \right|^2 $$

where:

H_j = a j-th hot spot of the protein;

A_j = a j-th atom of the at least one solution
conformation;

R = a 3x3 rotation matrix; and

T = a translation vector.
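
The functional of claim 18 is a sum over matched pairs of the squared distance |H_j - R A_j - T|^2, which is a standard least-squares superposition problem. The sketch below solves it with the Kabsch algorithm (singular value decomposition of the covariance matrix). The document does not state which solver is used, so the choice of Kabsch, the use of NumPy, and the function name `fit_rigid_transform` are assumptions of this illustration.

```python
import numpy as np

def fit_rigid_transform(hot_spots, atoms):
    """Return (R, T) minimizing sum_j |H_j - R A_j - T|^2 via the Kabsch algorithm."""
    H = np.asarray(hot_spots, dtype=float)   # matched hot spots, one per row
    A = np.asarray(atoms, dtype=float)       # matched ligand atoms, one per row
    h_center = H.mean(axis=0)
    a_center = A.mean(axis=0)
    # Covariance between centered atoms and centered hot spots.
    C = (A - a_center).T @ (H - h_center)
    U, _, Vt = np.linalg.svd(C)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # The optimal translation maps the rotated atom centroid onto the hot spot centroid.
    T = h_center - R @ a_center
    return R, T

def place_ligand(conformation, R, T):
    """Apply the rigid body transformation to every atom of a conformation."""
    return np.asarray(conformation, dtype=float) @ R.T + T
```

For three non-collinear matched pairs the minimizer is unique, which is consistent with the claim language that a match determines a unique rigid body transformation.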

19. The system of claim 11, wherein said means for
optimizing comprises means for optimizing multiple protein-
ligand complex formations, said means for optimizing
comprising:

means for eliminating each ligand position having
a predetermined percentage of ligand atoms with a
steric clash;

means for ranking remaining ligand positions using
an atom pairwise score with a desired atom score
cutoff;

after ranking, means for clustering the ligand
positions and selecting a top number n of ligand
positions; and

means for optimizing each ligand position of the n
positions, allowing the translation, rotation and
rotatable bonds of the ligand to vary.
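
The filtering, ranking and clustering elements of claim 19 can be sketched as a short pipeline. The specifics are hypothetical: the clash distance, the piecewise `pair_term`, the reading of the "atom score cutoff" as a cap on each ligand atom's contribution, and the RMSD-based greedy clustering are assumptions made only for illustration.

```python
import math

def clash_fraction(pose, protein, clash_dist=2.0):
    """Fraction of ligand atoms closer than clash_dist to any protein atom."""
    clashing = sum(1 for a in pose
                   if any(math.dist(a, p) < clash_dist for p in protein))
    return clashing / len(pose)

def pair_term(d):
    """Hypothetical pairwise term: penalty when close, reward in contact range."""
    if d < 3.0:
        return 3.0 - d
    if d <= 5.0:
        return -1.0
    return 0.0

def pose_score(pose, protein, atom_cutoff=1.0):
    """Sum of per-atom scores, each capped at atom_cutoff (lower is better)."""
    total = 0.0
    for a in pose:
        atom_score = sum(pair_term(math.dist(a, p)) for p in protein)
        total += min(atom_score, atom_cutoff)
    return total

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with the same atom order."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))

def select_poses(poses, protein, max_clash_fraction=0.3, rmsd_radius=1.0, n=2):
    # 1. Eliminate positions with too many clashing atoms.
    survivors = [p for p in poses
                 if clash_fraction(p, protein) <= max_clash_fraction]
    # 2. Rank the remaining positions by the pairwise score.
    ranked = sorted(survivors, key=lambda p: pose_score(p, protein))
    # 3. Cluster in rank order and keep the top n cluster representatives.
    kept = []
    for pose in ranked:
        if all(rmsd(pose, q) > rmsd_radius for q in kept):
            kept.append(pose)
        if len(kept) == n:
            break
    return kept
```

The positions returned by `select_poses` are the n candidates that would then be passed to the final flexible optimization.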

20. The system of claim 19, wherein said means for
optimizing comprises means for optimizing each ligand
position of the n positions using a BFGS optimization
algorithm with a simple atom pairwise score, allowing the
translation, rotation and rotatable bonds of the ligand to
vary.
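
A BFGS refinement of the kind recited in claim 20 can be sketched with SciPy's general-purpose optimizer. The example is a toy under stated assumptions: the four-atom ligand with a single rotatable bond, the seven-element parameter vector (three translations, a three-component rotation vector, one torsion), and the Lennard-Jones-like `pairwise_score` are hypothetical, and the gradient is left to SciPy's finite-difference estimate. The protein coordinates are never modified, which corresponds to holding the protein fixed.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def build_pose(x, ligand, bond=(1, 2), moving=(3,)):
    """Apply torsion, rotation and translation encoded in x to the ligand."""
    translation, rotvec, torsion = x[:3], x[3:6], x[6]
    coords = np.array(ligand, dtype=float)
    # Rotatable bond: twist the 'moving' atoms about the bond axis i -> j.
    i, j = bond
    axis = coords[j] - coords[i]
    axis = axis / np.linalg.norm(axis)
    twist = Rotation.from_rotvec(axis * torsion)
    idx = list(moving)
    coords[idx] = twist.apply(coords[idx] - coords[j]) + coords[j]
    # Rigid body: rotate about the ligand centroid, then translate.
    center = coords.mean(axis=0)
    rotated = Rotation.from_rotvec(rotvec).apply(coords - center)
    return rotated + center + translation

def pairwise_score(coords, protein, d0=4.0):
    """Hypothetical simple atom pairwise score (Lennard-Jones-like, minimum at d0)."""
    d = np.linalg.norm(coords[:, None, :] - protein[None, :, :], axis=2)
    r6 = (d0 / d) ** 6
    return float(np.sum(r6 * r6 - 2.0 * r6))

def optimize_pose(x0, ligand, protein):
    """Refine one ligand position with BFGS while the protein stays fixed."""
    def objective(x):
        return pairwise_score(build_pose(x, ligand), protein)
    return minimize(objective, x0, method="BFGS")
```

Starting each of the n retained positions from `x0 = np.zeros(7)` (no change to the matched placement) and calling `optimize_pose` yields a refined parameter vector in `result.x` and its score in `result.fun`.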

21. At least one program storage device readable by a
machine, tangibly embodying at least one program of
instructions executable by the machine to perform a method
of docking a ligand to a protein, comprising:

performing a pre-docking conformational search to
generate multiple solution conformations of the ligand;

generating a binding site image of the protein,
said binding site image comprising multiple hot spots;

matching hot spots of the binding site image to
atoms in at least one solution conformation of the
multiple solution conformations of the ligand to obtain
at least one ligand position relative to the protein;
and

optimizing the at least one ligand position while
allowing translation, orientation and rotatable bonds
of the ligand to vary, and while holding the protein
fixed.

22. The at least one program storage device of claim
21, wherein said performing the pre-docking conformational
search comprises creating a database of the multiple
solution conformations and storing said three-dimensional
database for subsequent use by said matching.

23. The at least one program storage device of claim
22, wherein said database of the multiple solution
conformations comprises a conformational database of a
combinatorial library.

24. The at least one program storage device of claim
21, wherein said performing the pre-docking conformational
search comprises:

randomly generating a plurality of uniformly
distributed conformations of the ligand;

minimizing a strain and a solvent accessible
surface area of each conformation of the plurality of
uniformly distributed conformations;

using the strain of each conformation to rank the
conformations; and

clustering the conformations and retaining a
desired number of top clusters of conformations, said
retained number of top clusters of conformations
comprising said multiple solution conformations of the
ligand.

25. The at least one program storage device of claim
21, wherein said generating the binding site image includes
at least one of creating a list of apolar hot spots
identifying points in the binding site that are favorable
for an apolar atom to bind, and generating a list of polar
hot spots identifying points in the binding site that are
favorable for a hydrogen bond donor or acceptor to bind.

26. The at least one program storage device of claim
25, wherein said generating the binding site image further
comprises:

placing a grid around the binding site of the
protein;

determining a hot spot search volume using said
grid;

determining hot spots using a grid-like search of
the hot spot search volume; and

for each type of hot spot, clustering the hot
spots and retaining a desired number of clusters of hot
spots with best scores, said desired number of clusters
comprising said multiple hot spots to be employed by
said matching.

27. The at least one program storage device of claim
21, wherein said matching comprises:

matching atoms of the at least one solution
conformation to appropriate hot spots of the protein by
positioning the at least one solution conformation as a
rigid body into the binding site image;

defining a match, said match determining a unique
rigid body transformation; and

using the unique rigid body transformation to
place the at least one solution conformation of the
ligand into the binding site of the protein.

28. The at least one program storage device of claim
27, wherein said determining the unique rigid body
transformation comprises determining the unique rigid body
transformation that minimizes:

$$ I(R, T) = \sum_{j=1}^{3} \left| H_j - R A_j - T \right|^2 $$

where:

H_j = a j-th hot spot of the protein;

A_j = a j-th atom of the at least one solution
conformation;

R = a 3x3 rotation matrix; and

T = a translation vector.

29. The at least one program storage device of claim
21, wherein said optimizing comprises optimizing multiple
protein-ligand complex formations, said optimizing
comprising:

eliminating each ligand position having a
predetermined percentage of ligand atoms with a steric
clash;

ranking remaining ligand positions using an atom
pairwise score with a desired atom score cutoff;

after ranking, clustering the ligand positions and
selecting a top number n of ligand positions; and

optimizing each ligand position of the n
positions, allowing the translation, rotation and
rotatable bonds of the ligand to vary.



30. The at least one program storage device of claim
29, wherein said optimizing comprises optimizing each ligand
position of the n positions using a BFGS optimization
algorithm with a simple atom pairwise score, allowing the
translation, rotation and rotatable bonds of the ligand to
vary.

MOLECULAR DOCKING TECHNIQUE FOR
SCREENING OF COMBINATORIAL LIBRARIES

Abstract of the Disclosure

A high-throughput molecular docking facility is
presented for screening combinatorial libraries to identify
binding ligands and ultimately pharmaceutical compounds.
The facility employs a pre-docking conformational search to
generate multiple solution conformations of a ligand. The
molecular docking facility includes: generating a binding
site image of the protein, the binding site image having
multiple hot spots; matching hot spots of the binding site
image to atoms in at least one solution conformation of the
multiple solution conformations of the ligand to obtain at
least one ligand position relative to the protein in a
ligand-protein complex formation; and optimizing the at
least one ligand position while allowing translation,
orientation and rotatable bonds of the ligand to vary, and
while holding the protein fixed.

[Figure: flowchart of the pre-docking conformational search; recoverable box text follows]

GENERATE UNIFORMLY DISTRIBUTED RANDOM CONFORMATIONS ALLOWING
ONLY ROTATABLE BONDS TO VARY

MINIMIZE INTERNAL ENERGY FOR EACH CONFORMATION ALLOWING
ONLY ROTATABLE BONDS TO VARY

[box text incomplete] INTERNAL ENERGY OVER CUTOFF
ABOVE CONFORMATION WITH [box text incomplete]

[Figure: flowchart of binding site image generation; recoverable box text follows]

PLACE A GRID AROUND THE BINDING SITE

DETERMINE A HOT SPOT SEARCH VOLUME

DETERMINE HOT SPOTS USING A GRID-LIKE SEARCH OF THE HOT
SPOT SEARCH VOLUME

FOR EACH TYPE OF HOT SPOT, CLUSTER THE GRID POINTS AND RETAIN DESIRED
NUMBER OF BEST CLUSTERED GRID POINTS



[Figure: flowchart of the matching step; recoverable box text follows]

(510) INITIALLY MATCH THE ATOMS OF A LIGAND
TO THE APPROPRIATE HOT SPOTS

(520) USE EACH MATCH TO DETERMINE A UNIQUE RIGID
BODY TRANSFORMATION THAT MINIMIZES
$I(R, T) = \sum_{j=1}^{3} \left| H_j - R A_j - T \right|^2$

USE EACH UNIQUE RIGID BODY TRANSFORMATION
TO PLACE THE LIGAND CONFORMATION
INTO THE BINDING SITE

[Figure: flowchart of the optimization of matched ligand positions; recoverable box text follows]

MATCHES FOR WHICH MORE THAN A PREDETERMINED PERCENTAGE OF THE
LIGAND ATOMS HAVE A STERIC CLASH ARE ELIMINATED

RANK REMAINING MATCHES USING [box text incomplete]

AFTER RANKING, CLUSTER THE MATCHES AND SELECT TOP N

OPTIMIZE EACH REMAINING MATCH [box text incomplete]
OPTIMIZATION ALGORITHM WITH A SIMPLE ATOM PAIRWISE SCORE
[box text incomplete] ROTATION AND ROTATABLE [box text incomplete]

I hereby claim the benefit under 35 U.S.C. Section 119(e) of any United States provisional
application(s) listed below:

(Application Serial No.)    (Filing Date)
(Application Serial No.)    (Filing Date)
(Application Serial No.)    (Filing Date)

I hereby claim the benefit under 35 U.S.C. Section 120 of any United States application(s), or
Section 365(c) of any PCT International application designating the United States, listed below and,
insofar as the subject matter of each of the claims of this application is not disclosed in the prior
United States or PCT International application in the manner provided by the first paragraph of 35
U.S.C. Section 112, I acknowledge the duty to disclose to the United States Patent and Trademark
Office all information known to me to be material to patentability as defined in Title 37, C.F.R.,
Section 1.56 which became available between the filing date of the prior application and the national
or PCT International filing date of this application:

(Application Serial No.)    (Filing Date)    (Status: patented, pending, abandoned)
(Application Serial No.)    (Filing Date)    (Status: patented, pending, abandoned)
(Application Serial No.)    (Filing Date)    (Status: patented, pending, abandoned)



I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements 
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that 
such willful false statements may jeopardize the validity of the application or any patent issued 
thereon. 

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorney(s) and/or
agent(s) to prosecute this application and transact all business in the Patent and Trademark Office
connected therewith. (list name and registration number)

Blanche E. Schiller, Esq. - Reg. No. 35,670
Kevin P. Radigan, Esq. - Reg. No. 31,789
Philip E. Hansen, Ph.D. - Reg. No. 32,700
Robert E. Heslin, Esq. - Reg. No. 24,778
Jeff Rothenberg, Esq. - Reg. No. 26,429
Martha L. Boden, Esq. - Reg. No. 39,115
Candice J. Clement, Esq. - Reg. No. 39,946

Send Correspondence to: Blanche E. Schiller, Esq.
HESLIN & ROTHENBERG, P.C.
5 Columbia Circle
Albany, NY 12203

Direct Telephone Calls to: (name and telephone number)
Blanche E. Schiller, Esq. - (518) 452-5600



Full name of sole or first inventor
David J. Diller, Ph.D.

Residence
176 Hickory Corner Road, East Windsor, NJ 08520

Date
[illegible] 2000

Citizenship
United States of America

Post Office Address
176 Hickory Corner Road
East Windsor, NJ 08520



Full name of second inventor, if any
Kenneth M. Merz, Jr.

Inventor's signature
[illegible]

Date
[illegible]

Residence
693 Berkshire Drive, State College, PA 16803

Citizenship
United States of America

Post Office Address
693 Berkshire Drive
State College, PA 16803